Search | BVS - MINISTÉRIO DA SAÚDE

PVR-Vocoder: A Pathological Voice Repair Vocoder for Voice Disorders.

Liu, Ganjun; Zhang, Tao; Liu, Xiaonan; Hou, Xiaohui; Ding, Biyun; Fu, Dehui; Pang, Zhibo.

IEEE J Biomed Health Inform ; PP, 2023 Dec 12.

Article in English | MEDLINE | ID: mdl-38090823

ABSTRACT

Vocoder-based speech synthesis has become a promising technique for meeting the demands of high-quality speech analysis, manipulation, and synthesis. However, most existing work focuses on synthesizing normal human voice with a high signal-to-noise ratio and neglects pathological voice disorders in speech interaction. In this work, we propose a non-linear voice repair vocoder for pathological vowels and sentences, which takes pathological speech as input and generates high-quality repaired speech. Our approach is specifically designed to enhance speech quality and intelligibility for individuals with voice disorders. We employ amplitude-modulation/frequency-modulation (AM-FM) and Teager energy operation techniques to enhance the quality of the pitch and spectral envelope. To tackle pitch instability and fracture, we present a spectral tracking algorithm, which not only avoids abrupt changes at the edges of voiced segments but also reduces half-pitch errors. Furthermore, we design a spectral reconstruction algorithm, which rebuilds the spectral structure by energy operation to accomplish spectral envelope repair. The proposed PVR-Vocoder shows exceptional performance in pathological voice intelligibility enhancement according to various quality measures, including objective indicators, subjective evaluation, and spectrum observations.
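The abstract above names the Teager energy operation without giving its form. As a minimal sketch, assuming the standard discrete Teager-Kaiser operator psi[n] = x[n]^2 - x[n-1]*x[n+1] (the abstract does not state which variant the paper uses), the operator can be written in a few lines. The function name `teager_energy` and the test tone are illustrative only.

```python
import math

def teager_energy(x):
    """Discrete Teager-Kaiser energy: psi[n] = x[n]^2 - x[n-1]*x[n+1].

    Returns one value per interior sample (n = 1 .. len(x) - 2).
    Assumption: this is the textbook operator, not necessarily the
    exact variant used in the paper.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]

# Illustrative input: a pure tone A*cos(omega*n), omega in radians per sample.
amplitude, omega = 2.0, 0.3
tone = [amplitude * math.cos(omega * n) for n in range(50)]

psi = teager_energy(tone)
# For a pure tone the operator is constant: (A * sin(omega))^2.
expected = (amplitude * math.sin(omega)) ** 2
```

For a pure tone the output depends on both amplitude and frequency, which is why the operator is commonly paired with AM-FM signal models.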

PVR-AFM: A Pathological Voice Repair System based on Non-linear Structure.

Zhang, Tao; Liu, Xiaonan; Liu, Ganjun; Shao, Yangyang.

J Voice ; 37(5): 648-662, 2023 Sep.

Article in English | MEDLINE | ID: mdl-37717981

ABSTRACT

OBJECTIVE: Speech signal processing has become an important technique for ensuring that a voice interaction system communicates accurately with the user by improving the clarity or intelligibility of speech signals. However, most existing work focuses only on processing the voice of the average speaker and ignores the communication needs of individuals with voice disorders, including voice-related professionals, older people, and smokers. To meet this need, it is essential to design a non-invasive repair system that processes pathological voices. METHODS: In this paper, we propose a repair system for multiple polyp vowels, such as /a/, /i/ and /u/. We use a non-linear model based on an amplitude-modulation (AM) and frequency-modulation (FM) structure to extract the pitch and formant of the pathological voice. To address pitch fracture and instability, we provide a pitch extraction algorithm that ensures pitch stability and avoids the double-pitch errors caused by the instability of the low-frequency signal. Furthermore, we design a formant reconstruction mechanism that determines the frequency and bandwidth needed to accomplish formant repair. RESULTS: Spectrum observation and objective indicators show that the system performs better at improving the intelligibility of pathological speech.

Subjects

Voice Disorders , Voice , Humans , Aged , Speech , Voice Disorders/diagnosis , Algorithms , Cognition
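The PVR-AFM abstract above describes an AM-FM model for extracting pitch and formant but gives no formulas. One common way to demodulate an AM-FM signal is the DESA-2 energy separation algorithm, shown here as a hedged sketch; whether the paper uses DESA-2 or another demodulation scheme is an assumption, and the names `teager` and `desa2` are illustrative.

```python
import math

def teager(x):
    """Discrete Teager-Kaiser energy for interior samples of x."""
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]

def desa2(x):
    """DESA-2 estimates of instantaneous frequency (rad/sample) and amplitude.

    With y[n] = x[n+1] - x[n-1]:
        omega ~ 0.5 * acos(1 - psi[y] / (2 * psi[x]))
        |a|   ~ 2 * psi[x] / sqrt(psi[y])
    Samples where either energy is non-positive are skipped, so the
    outputs may be shorter than the input.
    """
    y = [x[n + 1] - x[n - 1] for n in range(1, len(x) - 1)]  # y[i] is at sample i+1
    psi_x = teager(x)  # psi_x[i] is at sample i+1
    psi_y = teager(y)  # psi_y[j] is at sample j+2
    freqs, amps = [], []
    for j, py in enumerate(psi_y):
        px = psi_x[j + 1]  # align both energies on sample j+2
        if px <= 0.0 or py <= 0.0:
            continue
        arg = max(-1.0, min(1.0, 1.0 - py / (2.0 * px)))
        freqs.append(0.5 * math.acos(arg))
        amps.append(2.0 * px / math.sqrt(py))
    return freqs, amps

# Illustrative check on a constant-amplitude, constant-frequency tone.
tone = [1.5 * math.cos(0.4 * n + 0.2) for n in range(64)]
freqs, amps = desa2(tone)
```

On a stationary tone the estimates recover the true frequency and amplitude; on real pathological speech the signal would normally be band-pass filtered around the component of interest first.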

GBNF-VAE: A Pathological Voice Enhancement Model Based on Gold Section for Bottleneck Feature With Variational Autoencoder.

Liu, Ganjun; Zhang, Tao; Ding, Biyun; Lv, Ying; Hou, Xiaohui; Guo, Haoyang; Wu, Yaqin; Fu, Dehui.

J Voice ; 2023 May 09.

Article in English | MEDLINE | ID: mdl-37169702

ABSTRACT

OBJECTIVE: Speech enhancement has become a promising technique for improving the quality of a degraded speech signal. Current work focuses mainly on separating normal speech from noise and has neglected the low quality of impaired speech caused by anomalous glottal flow. To enhance pathological speech effectively, it is essential to design a separation mechanism that extracts high-dimensional timbre features and speech features separately in order to suppress low-dimensional noise. METHODS: In this paper, we propose an enhancement model, GBNF-VAE, that extracts timbre efficiently by reducing interference from anomalous airflow noise and combines semantic features with timbre features to synthesize the enhanced speech. In particular, the bottleneck feature characterizes the timbre with a number of nodes controlled through the Golden Section method, which improves computational efficiency. In addition, a variational autoencoder is adopted to extract semantic features, which are combined with the timbre features to synthesize the enhanced speech. RESULTS: Spectrum observation, objective indicators, and subjective evaluation all show the outstanding performance of GBNF-VAE in pathological speech quality enhancement.
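The GBNF-VAE abstract above says the number of bottleneck nodes is controlled through the Golden Section method but does not say how. A generic golden-section search over a unimodal cost is sketched below as one plausible reading; the cost function `toy_cost`, the search range 8 to 256, and the function name `golden_section_min` are hypothetical and not taken from the paper.

```python
import math

INV_PHI = (math.sqrt(5.0) - 1.0) / 2.0  # 1/phi, about 0.618

def golden_section_min(f, lo, hi, tol=1e-6):
    """Minimise a unimodal function f on [lo, hi] by golden-section search."""
    c = hi - INV_PHI * (hi - lo)
    d = lo + INV_PHI * (hi - lo)
    fc, fd = f(c), f(d)
    while hi - lo > tol:
        if fc < fd:
            # Minimum lies in [lo, d]; the old c becomes the new d.
            hi, d, fd = d, c, fc
            c = hi - INV_PHI * (hi - lo)
            fc = f(c)
        else:
            # Minimum lies in [c, hi]; the old d becomes the new c.
            lo, c, fc = c, d, fd
            d = lo + INV_PHI * (hi - lo)
            fd = f(d)
    return (lo + hi) / 2.0

def toy_cost(nodes):
    """Hypothetical stand-in for a validation loss as a function of node count."""
    return (nodes - 37.0) ** 2

best = golden_section_min(toy_cost, 8.0, 256.0)
best_nodes = round(best)  # a layer size must be an integer
```

Each iteration reuses one of the two previous evaluations, so only one new cost evaluation is needed per step. This matters when each evaluation means training a model.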

Multiple Vowels Repair Based on Pitch Extraction and Line Spectrum Pair Feature for Voice Disorder.

Zhang, Tao; Shao, Yangyang; Wu, Yaqin; Pang, Zhibo; Liu, Ganjun.

IEEE J Biomed Health Inform ; 24(7): 1940-1951, 2020 Jul.

Article in English | MEDLINE | ID: mdl-32149701

ABSTRACT

Individuals such as voice-related professionals, elderly people, and smokers increasingly suffer from voice disorders, which underlines the importance of pathological voice repair. Previous work on pathological voice repair has addressed only the sustained vowel /a/; repairing multiple vowels remains challenging because of unstable pitch extraction and unsatisfactory formant reconstruction. In this paper, a multiple-vowel repair method based on pitch extraction and the Line Spectrum Pair (LSP) feature is proposed for voice disorders. It broadens the scope of voice repair from the single vowel /a/ to the vowels /a/, /i/ and /u/ and repairs these vowels successfully. A deep neural network is used as a classifier to distinguish normal from pathological voices. The Wavelet Transform and Hilbert-Huang Transform are applied for pitch extraction. The formant is reconstructed from the LSP feature. The final repaired voice is obtained by synthesizing the pitch and the formant. The proposed method is validated on the Saarbrücken Voice Database (SVD). The improvements achieved on three metrics, Segmental Signal-to-Noise Ratio, LSP distance measure, and Mel cepstral distance measure, are 45.87%, 50.37%, and 15.56%, respectively. In addition, an intuitive spectrogram-based analysis shows a prominent repair effect.

Subjects

Sound Spectrography/methods , Voice Disorders/diagnosis , Voice/physiology , Wavelet Analysis , Aged , Humans , Neural Networks, Computer
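The last abstract above reports an improvement in Segmental Signal-to-Noise Ratio without defining it. A minimal sketch of the usual frame-averaged definition follows; the frame length of 256 samples and the clamping range of -10 to 35 dB are common conventions assumed here, not values from the paper, and the function name `segmental_snr` is illustrative.

```python
import math

def segmental_snr(reference, processed, frame_len=256, lo_db=-10.0, hi_db=35.0):
    """Mean per-frame SNR in dB between a reference and a processed signal.

    Each frame's SNR is 10*log10(sum(ref^2) / sum((ref - proc)^2)),
    clamped to [lo_db, hi_db]. Frames with zero reference energy are skipped.
    """
    n = min(len(reference), len(processed))
    frame_snrs = []
    for start in range(0, n - frame_len + 1, frame_len):
        ref = reference[start:start + frame_len]
        proc = processed[start:start + frame_len]
        sig_energy = sum(r * r for r in ref)
        err_energy = sum((r - p) ** 2 for r, p in zip(ref, proc))
        if sig_energy == 0.0:
            continue
        snr = hi_db if err_energy == 0.0 else 10.0 * math.log10(sig_energy / err_energy)
        frame_snrs.append(min(max(snr, lo_db), hi_db))
    return sum(frame_snrs) / len(frame_snrs) if frame_snrs else float("nan")

# Illustrative signals: the processed copy is the reference scaled by 0.9,
# so the error is 10% of the reference in every frame (20 dB).
ref_signal = [math.sin(0.05 * n) for n in range(1024)]
proc_signal = [0.9 * s for s in ref_signal]
score = segmental_snr(ref_signal, proc_signal)
```

Averaging per-frame values in dB, and not computing one global ratio, keeps loud segments from dominating the score.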
